Predicting C. difficile infection severity from the taxonomic composition of the gut microbiome
Kelly L. Sovacool1, Sarah E. Tomkovich2, Megan L. Coden3, Vincent B. Young2,4, Krishna Rao4, Patrick D. Schloss2,5
1 Department of Computational Medicine & Bioinformatics, University of Michigan
2 Department of Microbiology & Immunology, University of Michigan
3 Department of Molecular, Cellular, and Developmental Biology, University of Michigan
4 Division of Infectious Diseases, Department of Internal Medicine, University of Michigan
5 Center for Computational Medicine and Bioinformatics, University of Michigan
Introduction
- C. difficile infection (CDI) can lead to adverse outcomes including recurrent
infections, colectomy, and death.
- The composition of the gut microbiome plays an important role in
colonization resistance against C. difficile and in clearance of the
pathogen after exposure.
- We have 16S amplicon sequence data from CDI patient stool samples, with
TODO samples classified as severe CDI and
TODO as not severe according to the Infectious Diseases
Society of America (IDSA) definition.
- IDSA defines a CDI case as severe if the patient has a white blood cell
count ≥ 15 k/μL or a serum creatinine level ≥ 1.5 mg/dL.
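The severity rule can be written as a small labelling function. The following is a minimal sketch, not the authors' code: the function name, the argument names, and the handling of missing lab values are hypothetical, and it treats the two criteria as alternatives (meeting either threshold is enough), which is an assumption about how the rule is applied.

```python
from typing import Optional

# Thresholds taken from the text above.
WBC_THRESHOLD_K_PER_UL = 15.0
CREATININE_THRESHOLD_MG_PER_DL = 1.5


def is_severe_idsa(
    wbc_k_per_ul: Optional[float],
    creatinine_mg_per_dl: Optional[float],
) -> Optional[bool]:
    """Label a CDI case as severe (True), not severe (False), or unknown (None).

    Assumption: either criterion alone is sufficient. A case is only called
    "not severe" when both lab values are present and both are below threshold.
    """
    wbc_high = (
        None if wbc_k_per_ul is None
        else wbc_k_per_ul >= WBC_THRESHOLD_K_PER_UL
    )
    creatinine_high = (
        None if creatinine_mg_per_dl is None
        else creatinine_mg_per_dl >= CREATININE_THRESHOLD_MG_PER_DL
    )
    if wbc_high or creatinine_high:
        return True
    if wbc_high is None or creatinine_high is None:
        # A missing value could have met its threshold, so the label is unknown.
        return None
    return False
```

Returning None for incomplete records keeps cases with missing lab values out of both classes instead of silently counting them as not severe.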
Methods
- Sequences were processed with mothur according to the MiSeq SOP and clustered
into de novo OTUs at a 3% distance threshold.
- We then trained machine learning (ML) models with OTU abundances as features to
predict the IDSA severity of CDI cases using the mikropml R package.
- The dataset was randomly split into training and testing sets, with 80% of the
data in the training set. Models were trained with 5-fold cross-validation
repeated 100 times, and the performance of the best model was measured on the
testing set as the area under the receiver operating characteristic curve
(AUROC).
- This was repeated for 100 different random seeds and three different ML methods:
logistic regression, random forest, and support vector machines with a radial
basis kernel.
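The evaluation scheme above can be illustrated with two small helpers: a seeded 80/20 split and an AUROC calculation. This is a sketch in plain Python, not the mikropml implementation; the function names are hypothetical, the split is a simple unstratified shuffle, and model fitting and cross-validation are left out.

```python
import random


def train_test_split(n_samples, training_frac=0.8, seed=0):
    """Shuffle sample indices with a fixed seed and split them into two lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(training_frac * n_samples)
    return indices[:n_train], indices[n_train:]


def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# One split per random seed; each split would then be used to train and test
# one model per ML method (model fitting is not shown here).
splits = [train_test_split(n_samples=50, seed=seed) for seed in range(100)]
```

With one test-set AUROC per seed collected in a list, `statistics.median` gives the per-method summary value.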
Results
- This process yielded median AUROC values of TODO for logistic
regression, TODO for random forest, and TODO for
support vector machines.
- Feature importance was determined with a permutation test for the best random
forest model, revealing that the five OTUs contributing most to model
performance were TODO.
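The idea behind permutation-based feature importance can be sketched as follows: shuffle one feature across samples, re-score the model, and record how far the AUROC falls. This is a generic illustration in plain Python, not the procedure implemented in mikropml; `predict`, the function names, and the toy data are hypothetical.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def permutation_importance(predict, rows, labels, feature_index,
                           n_permutations=100, seed=0):
    """Mean drop in AUROC after shuffling one feature column across samples."""
    rng = random.Random(seed)
    baseline = auroc(labels, [predict(row) for row in rows])
    drops = []
    for _ in range(n_permutations):
        column = [row[feature_index] for row in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value in place of the original.
        permuted = [
            list(row[:feature_index]) + [value] + list(row[feature_index + 1:])
            for row, value in zip(rows, column)
        ]
        drops.append(baseline - auroc(labels, [predict(row) for row in permuted]))
    return sum(drops) / len(drops)


# Toy data: two made-up features per sample; the stand-in "model" uses only
# the first feature, so shuffling the second cannot change its predictions.
toy_rows = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
toy_labels = [0, 0, 1, 1]


def toy_predict(row):
    return row[0]


importance_used = permutation_importance(toy_predict, toy_rows, toy_labels, 0)
importance_unused = permutation_importance(toy_predict, toy_rows, toy_labels, 1)
```

A feature the model relies on shows a positive mean drop, while a feature the model ignores shows a drop of exactly zero; ranking features by this drop yields a "top OTUs" list of the kind described above.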
Conclusions
- The modest performance may be improved in future work by training models to
predict clinically confirmed adverse patient outcomes, such as recurrence,
admission to intensive care, colectomy, or death, rather than IDSA severity.
- Predicting a patient’s risk of experiencing severe CDI and identifying the
specific microbiome features that distinguish severe CDI cases will allow
clinicians to tailor interventions based on each patient’s individual
microbiome, ultimately leading to better health outcomes.
Acknowledgements
This research was supported by National Institutes of Health grant U01AI124255
and by the Michigan Institute for Clinical and Health Research Postdoctoral
Translational Scholars Program (UL1TR002240 from the National Center for
Advancing Translational Sciences).